Communication optimization for intermediate data of MapReduce computing model
CAO Yunpeng, WANG Haifeng
Journal of Computer Applications, 2018, 38(4): 1078-1083. DOI: 10.11772/j.issn.1001-9081.2017092358
To address the communication overhead caused by the large volume of intermediate data that must cross rack switches after the Map phase of a MapReduce job, a new optimization method was proposed for map-intensive jobs. First, features were extracted from pre-running scheduling information, and the data communication activity of each job was quantified. Then, a naive Bayes classification model was trained on historical job-running data and used to predict each job's communication class. Finally, jobs with an active intermediate-data communication process were mapped onto the same rack to preserve communication locality. The experimental results show that the proposed communication optimization scheme works well for shuffle-intensive jobs, improving computing performance by 4% to 5%, and that in a multi-user, multi-job environment it reduces intermediate data by 4.1%. The proposed method can effectively reduce communication latency in large-scale data processing and improve the performance of heterogeneous clusters.
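The three-step pipeline described in the abstract (quantify communication activity, classify jobs with naive Bayes, co-locate communication-active jobs on one rack) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the features (input size and reduce-to-map task ratio), the activity metric (map output bytes per map input byte), the 0.5 threshold, the Gaussian form of the naive Bayes likelihood, the names `communication_activity`, `GaussianNaiveBayes` and `place_jobs`, the rack-selection rule, and all data values are hypothetical.

```python
import math

# Hypothetical threshold: a job whose map output is at least half its map input
# is labeled "active" (communication-heavy). The paper's exact metric is not given here.
ACTIVE_RATIO = 0.5


def communication_activity(map_input_gb: float, map_output_gb: float) -> float:
    """Assumed activity metric: intermediate bytes produced per input byte."""
    return map_output_gb / map_input_gb


class GaussianNaiveBayes:
    """Naive Bayes with a per-class, per-feature Gaussian likelihood (an assumed form)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior, self.stats = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(y))
            cols = list(zip(*rows))
            means = [sum(col) / len(col) for col in cols]
            # A small epsilon keeps variances positive when a feature is constant.
            variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                         for col, m in zip(cols, means)]
            self.stats[c] = (means, variances)
        return self

    def _log_posterior(self, x, c):
        # Unnormalized log posterior: log P(c) + sum_i log N(x_i; mean_ci, var_ci)
        means, variances = self.stats[c]
        score = self.log_prior[c]
        for xi, m, v in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        return score

    def predict(self, x):
        return max(self.classes, key=lambda c: self._log_posterior(x, c))


def place_jobs(jobs, rack_slots, model):
    """Pin each predicted-active job to a single rack that can hold all its tasks.

    Returns job name -> rack name, or None when the job is left to the default scheduler.
    """
    free = dict(rack_slots)
    placement = {}
    for name, (features, num_tasks) in jobs.items():
        rack = None
        if model.predict(features) == "active":
            fits = [r for r, slots in free.items() if slots >= num_tasks]
            if fits:
                rack = max(fits, key=lambda r: free[r])  # rack with the most free slots
                free[rack] -= num_tasks
        placement[name] = rack
    return placement


# Hypothetical history: (input_gb, reduces_per_map) features plus observed map I/O sizes.
HISTORY = [
    ((10.0, 0.50), 10.0, 9.5),
    ((20.0, 0.45), 20.0, 18.0),
    ((15.0, 0.40), 15.0, 12.0),
    ((12.0, 0.05), 12.0, 0.3),
    ((25.0, 0.02), 25.0, 0.5),
    ((18.0, 0.04), 18.0, 0.9),
]
X = [features for features, _, _ in HISTORY]
y = ["active" if communication_activity(i, o) >= ACTIVE_RATIO else "quiet"
     for _, i, o in HISTORY]
model = GaussianNaiveBayes().fit(X, y)

jobs = {
    "job-sort": ((14.0, 0.42), 10),
    "job-grep": ((16.0, 0.03), 6),
    "job-join": ((20.0, 0.48), 6),
}
placement = place_jobs(jobs, {"rack-a": 8, "rack-b": 12}, model)
print(placement)  # {'job-sort': 'rack-b', 'job-grep': None, 'job-join': 'rack-a'}
```

In this sketch only jobs predicted to be communication-active are pinned, so their map output can be shuffled within one rack instead of crossing rack switches; jobs predicted to be quiet, or active jobs that fit on no single rack, fall back to the ordinary scheduler. A real deployment would take rack membership from the cluster's topology configuration rather than a hard-coded slot table.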